AITopics | evaluation measure

Toward Interpretable Evaluation Measures for Time Series Segmentation

Neural Information Processing SystemsJun-14-2026, 13:26:35 GMT

Time series segmentation is a fundamental task in analyzing temporal data across various domains, from human activity recognition to energy monitoring. While numerous state-of-the-art methods have been developed to tackle this problem, the evaluation of their performance remains critically limited. Existing measures predominantly focus on change point accuracy or rely on point-based measures such as Adjusted Rand Index (ARI), which fail to capture the quality of the detected segments, ignore the nature of errors, and offer limited interpretability. In this paper, we address these shortcomings by introducing two novel evaluation measures: WARI (Weighted Adjusted Rand Index), that accounts for the position of segmentation errors, and SMS (State Matching Score), a fine-grained measure that identifies and scores four fundamental types of segmentation errors while allowing error-specific weighting. We empirically validate WARI and SMS on synthetic and real-world benchmarks, showing that they not only provide a more accurate assessment of segmentation quality but also uncover insights, such as error provenance and type, that are inaccessible with traditional measures.

data mining, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.93)
Asia (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.69)
(4 more...)

Add feedback

The Elephant in the Room: Towards A Reliable Time-Series Anomaly Detection Benchmark

Neural Information Processing SystemsMar-22-2026, 09:39:34 GMT

Time-series anomaly detection is a fundamental task across scientific fields and industries. However, the field has long faced the ``elephant in the room:'' critical issues including flawed datasets, biased evaluation measures, and inconsistent benchmarking practices that have remained largely ignored and unaddressed. We introduce the TSB-AD to systematically tackle these issues in the following three aspects: (i) Dataset Integrity: with 1070 high-quality time series from a diverse collection of 40 datasets (doubling the size of the largest collection and four times the number of existing curated datasets), we provide the first large-scale, heterogeneous, meticulously curated dataset that combines the effort of human perception and model interpretation; (ii) Measure Reliability: by revealing issues and biases in evaluation measures, we identify the most reliable and accurate measure, namely, VUS-PR for anomaly detection in time series to address concerns from the community; and (iii) Comprehensive Benchmarking: with a broad spectrum of 40 detection algorithms, from statistical methods to the latest foundation models, we perform a comprehensive evaluation that includes a thorough hyperparameter tuning and a unified setup for a fair and reproducible comparison. Our findings challenge the conventional wisdom regarding the superiority of advanced neural network architectures, revealing that simpler architectures and statistical methods often yield better performance. The promising performance of neural networks on multivariate cases and foundation models on point anomalies highlights the need for further advancements in these methods.

artificial intelligence, data mining, machine learning, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.81)

Add feedback

c3f3c690b7a99fba16d0efd35cb83b2c-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-18-2026, 00:20:44 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Ohio (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Information Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Communications > Networks (1.00)
(5 more...)

Add feedback

9efe8db7fab57e19eed25718abedbbd2-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-16-2026, 04:53:40 GMT

artificial intelligence, library, machine learning, (14 more...)

Neural Information Processing Systems

Country: Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.06)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Appendix overview This section provides an overview of the supplementary materials for our submission

Neural Information Processing SystemsFeb-11-2026, 09:27:59 GMT

We further focus on measuring the overlap between datasets. We check for the overlap on the level of systematic reviews based on the review's Cochrane ID.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.06)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Add feedback

TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction

Fujita, Aoi, Yamamoto, Taichi, Nakayama, Yuri, Kobayashi, Ryota

arXiv.org Artificial IntelligenceDec-9-2025

Rapid expansion of social media platforms such as X (formerly Twitter), Facebook, and Reddit has enabled large-scale analysis of public perceptions on diverse topics, including social issues, politics, natural disasters, and consumer sentiment. Topic modeling is a widely used approach for uncovering latent themes in text data, typically framed as an unsupervised classification task. However, traditional models, originally designed for longer and more formal documents, struggle with short social media posts due to limited co-occurrence statistics, fragmented semantics, inconsistent spelling, and informal language. To address these challenges, we propose a new method, TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction. Specifically, each text is embedded using Sentence-BERT (SBERT) and provisionally clustered using Gaussian Mixture Models (GMM). The clusters are then refined iteratively using a supervised projection based on linear discriminant analysis, followed by GMM-based clustering until convergence. Notably, our method operates directly on raw text, eliminating the need for preprocessing steps such as stop word removal. We evaluate our approach on four diverse datasets, 20News, AgNewsTitle, Reddit, and TweetTopic, each containing human-labeled topic information. Compared with seven baseline methods, including a recent SBERT-based method and a zero-shot generative AI method, our approach achieves the highest similarity to human-annotated topics, with significant improvements for both social media posts and online news articles. Additionally, qualitative analysis shows that our method produces more interpretable topics, highlighting its potential for applications in social media data and web content analytics.

computational linguistic, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2512.06694

Country:

North America > United States > California (0.67)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Media > News (0.90)
Health & Medicine (0.68)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

The Eigenvalues Entropy as a Classifier Evaluation Measure

Dembélé, Doulaye

arXiv.org Artificial IntelligenceNov-5-2025

Classification is a machine learning method used in many practical applications: text mining, handwritten character recognition, face recognition, pattern classification, scene labeling, computer vision, natural langage processing. A classifier prediction results and training set information are often used to get a contingency table which is used to quantify the method quality through an evaluation measure. Such measure, typically a numerical value, allows to choose a suitable method among several. Many evaluation measures available in the literature are less accurate for a dataset with imbalanced classes. In this paper, the eigenvalues entropy is used as an evaluation measure for a binary or a multi-class problem. For a binary problem, relations are given between the eigenvalues and some commonly used measures, the sensitivity, the specificity, the area under the operating receiver characteristic curve and the Gini index. A by-product result of this paper is an estimate of the confusion matrix to deal with the curse of the imbalanced classes. Various data examples are used to show the better performance of the proposed evaluation measure over the gold standard measures available in the literature.

artificial intelligence, eigenvalue, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.01904

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Toward Interpretable Evaluation Measures for Time Series Segmentation

Chavelli, Félix, Boniol, Paul, Thomazo, Michaël

arXiv.org Artificial IntelligenceOct-28-2025

Time series segmentation is a fundamental task in analyzing temporal data across various domains, from human activity recognition to energy monitoring. While numerous state-of-the-art methods have been developed to tackle this problem, the evaluation of their performance remains critically limited. Existing measures predominantly focus on change point accuracy or rely on point-based measures such as Adjusted Rand Index (ARI), which fail to capture the quality of the detected segments, ignore the nature of errors, and offer limited interpretability. In this paper, we address these shortcomings by introducing two novel evaluation measures: WARI (Weighted Adjusted Rand Index), that accounts for the position of segmentation errors, and SMS (State Matching Score), a fine-grained measure that identifies and scores four fundamental types of segmentation errors while allowing error-specific weighting. We empirically validate WARI and SMS on synthetic and real-world benchmarks, showing that they not only provide a more accurate assessment of segmentation quality but also uncover insights, such as error provenance and type, that are inaccessible with traditional measures.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.23261

Country: